Morphological Decomposition for ASR in German

Authors

  • Martine Adda-Decker
  • Gilles Adda

Abstract

In this contribution we report on our ongoing work in lexical decomposition for automatic speech recognition (ASR). Lexical decomposition is investigated with a twofold goal: optimizing lexical coverage and improving automatic letter-to-sound conversion. Whereas morphological decomposition is a widely studied domain in linguistics, our interest here is limited to identifying and processing the statistically most relevant sources of lexical variation in text corpora. Lexical variation is shown to be particularly important for nouns, due to compounding. A set of about 340 decomposition rules has been developed using statistics from 300M words of different newspaper sources (primarily 14 years of the TAZ, the Berliner TAgesZeitung). The out-of-vocabulary (OOV) rate on the same 300M words is reduced from 5.2% to 4.6% in case-sensitive form and to 4.2% in case-insensitive form. For letter-to-sound conversion, cross-morpheme letter sequences are a major source of ambiguity. By reducing these ambiguities, decomposition contributes to more consistent phonemic transcriptions in pronunciation dictionaries.
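The abstract does not list the 340 decomposition rules, so the following is only a minimal sketch of the general idea, not the authors' method: a recursive compound splitter over a lexicon, plus an OOV-rate count with and without decomposition. The toy lexicon `LEXICON`, the linking-element list `LINKERS`, the minimum constituent length, and the functions `decompose` and `oov_rate` are all hypothetical illustrations. Matching is lowercased, i.e. it corresponds to the case-insensitive setting only.

```python
from __future__ import annotations

# Hypothetical toy vocabulary (lowercased); NOT the authors' lexicon.
LEXICON = {"bahn", "hof", "haus", "tür", "stadt", "rat", "zeitung", "tag"}

# Hypothetical linking elements (Fugenelemente) tried between constituents;
# "" means the two constituents are joined directly.
LINKERS = ("", "s", "es", "n", "en")


def decompose(word: str, lexicon: set[str], min_len: int = 3) -> list[str] | None:
    """Recursively split `word` into in-lexicon constituents.

    Tries split points left to right and returns the first complete
    segmentation found, or None if there is none. A linking element is
    kept attached to the constituent on its left ("tag" + "es" -> "tages").
    Matching is case-insensitive because the word is lowercased.
    """
    w = word.lower()
    if w in lexicon:
        return [w]
    # Leave at least `min_len` letters for both the head and the remainder.
    for i in range(min_len, len(w) - min_len + 1):
        head, rest = w[:i], w[i:]
        if head not in lexicon:
            continue
        for link in LINKERS:
            if not rest.startswith(link):
                continue
            tail = rest[len(link):]
            if len(tail) < min_len:
                continue
            parts = decompose(tail, lexicon, min_len)
            if parts is not None:
                return [head + link] + parts
    return None


def oov_rate(tokens: list[str], lexicon: set[str], use_decomposition: bool = False) -> float:
    """Percentage of running tokens not covered by the lexicon."""
    if not tokens:
        return 0.0

    def covered(token: str) -> bool:
        if token.lower() in lexicon:
            return True
        # Optionally count a token as covered if it splits into known units.
        return use_decomposition and decompose(token, lexicon) is not None

    misses = sum(1 for t in tokens if not covered(t))
    return 100.0 * misses / len(tokens)


if __name__ == "__main__":
    TOKENS = ["Bahnhof", "Stadtrat", "Tageszeitung", "Haus", "Rathaus", "Xylophon"]
    print(decompose("Tageszeitung", LEXICON))  # ['tages', 'zeitung']
    print(round(oov_rate(TOKENS, LEXICON), 1))  # 83.3
    print(round(oov_rate(TOKENS, LEXICON, use_decomposition=True), 1))  # 16.7
```

The toy figures only show the mechanism; they say nothing about real corpus OOV rates. The same segmentation is what helps letter-to-sound conversion: in *Bahnsteig* (Bahn|steig) the "st" begins a constituent and is pronounced /ʃt/, whereas in *Haustür* (Haus|tür) the same letters straddle the boundary and are pronounced /st/.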

Publication date: 2001